Abstract

I examine how terminological languages can be used
to manage linguistic data during NL research and
development. In particular, I consider the lexical
semantics task of characterizing semantic verb
classes and show how the language can be extended
to flag inconsistencies in verb class definitions,
identify the need for new verb classes, and identify
appropriate linguistic hypotheses for a new verb's
behavior.



1 Introduction 

Problems with consistency and completeness can
arise when writing a wide-coverage grammar or
analyzing lexical data, since both tasks involve
working with large amounts of data. Since
terminological knowledge representation languages
have been valuable for managing data in other
applications, such as a software information system
that manages a large knowledge base of plans
(Devanbu and Litman, 1991), it is worthwhile
considering how these languages can be used in
linguistic data management tasks. In addition to
inheritance, terminological systems provide a
criterial semantics for links and automatic
classification, which inserts a new concept into a
taxonomy so that it directly links to concepts more
general than it and more specific than it (Woods and
Schmolze, 1992).
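The automatic classification described above can be
illustrated with a minimal sketch. It assumes, as a
simplification, that a concept is just a set of
required properties and that one concept subsumes
another when its properties are a proper subset of
the other's. This is not how Classic itself is
implemented; the concept names and the classify
function are hypothetical.

```python
# Hypothetical taxonomy: each concept is the set of properties it requires.
TAXONOMY = {
    "ENTITY": frozenset(),
    "LIQUID-ENTITY": frozenset({"liquid"}),
    "WATER": frozenset({"liquid", "consumable"}),
}


def classify(new_props, taxonomy):
    """Return (parents, children) for a new concept.

    parents:  the most specific concepts that are more general than it
    children: the most general concepts that are more specific than it
    Assumption: subsumption is proper-subset inclusion of properties.
    """
    new_props = frozenset(new_props)
    subsumers = {n for n, p in taxonomy.items() if p < new_props}
    parents = {
        n for n in subsumers
        if not any(taxonomy[n] < taxonomy[m] for m in subsumers)
    }
    subsumees = {n for n, p in taxonomy.items() if new_props < p}
    children = {
        n for n in subsumees
        if not any(taxonomy[m] < taxonomy[n] for m in subsumees)
    }
    return parents, children


# A new concept requiring "liquid" and "non-consumable" is placed
# directly under LIQUID-ENTITY rather than under ENTITY.
paint_parents, paint_children = classify({"liquid", "non-consumable"}, TAXONOMY)
```

The point of the sketch is only that the new concept
is linked to its nearest neighbours in the taxonomy,
not to every concept that happens to be more general.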



Terminological languages have been used in NLP
applications for lexical representation (Burkert,
1995), for grammar representation (Brachman and
Schmolze, 1991), and to assist in the acquisition
and maintenance of domain specific lexical
semantics knowledge (Ayuso et al., 1987). Here I
explore additional linguistic data management
tasks. In particular, I examine how a terminological
language such as Classic (Brachman et al., 1991) can
assist a lexical semanticist with the management of
verb classes. In conclusion, I discuss ways in which
terminological languages can be used during grammar
writing.

Consider the tasks that confront a lexical
semanticist. The regular participation of verbs
belonging to a particular semantic class in a
limited number of syntactic alternations is crucial
in lexical semantics. A popular research direction
assumes that the syntactic behavior of a verb is
systematically influenced by its meaning (Levin,
1993; Hale and Keyser, 1987) and that any set of
verbs whose members pattern together with respect to
syntactic alternations should form a semantically
coherent class (Levin, 1993). Once such a class is
identified, the meaning component that the member
verbs share can be identified. This gives further
insight into lexical representation for the words in
the class (Levin, 1993).

Terminological languages can support three
important functions in this domain. First, the
process of representing the system in a taxonomic
logic can serve as a check on the rigor and
precision of the original account. Once the account
is represented, the terminological system can flag
inconsistencies. Second, the classifier can identify
an existing verb class that might explain an
unassigned verb's behavior. That is, given a set of
syntactically analyzed sentences that exemplify the
syntactic alternations allowed and disallowed for
that verb, the classifier will provide appropriate
linguistic hypotheses. Third, the classifier can
identify the need for new verb classes by flagging
verbs that are not members of any existing, defined
verb classes. Together, these functions provide
tools for the lexical semanticist that are
potentially very useful.

The second and third of these three functions can
be provided in two steps: (1) classifying each
alternation for a particular verb according to the
type of semantic mapping allowed for the verb and
its arguments; and (2) either identifying the verb
class that has the given pattern of classified
alternations or using the pattern to form the
definition of a new verb class.
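Step (2) can be sketched as follows, assuming that a
verb class is defined by a set of alternations its
verbs allow and a set they disallow, and that a verb
fits a class when its observed pattern includes
both. The class names, the VERB_CLASSES table and
the classify_verb function are hypothetical; the two
alternation names are borrowed from the pour
examples.

```python
# Hypothetical class definitions; the class names are placeholders.
VERB_CLASSES = {
    "class-A": {
        "good": {"benefactive-ditransitive", "benefactive-transitive"},
        "bad": set(),
    },
    "class-B": {
        "good": {"benefactive-transitive"},
        "bad": {"benefactive-ditransitive"},
    },
}


def classify_verb(good, bad):
    """Match a verb's classified alternations against known classes.

    good / bad: alternation types found in the verb's grammatical
    and ungrammatical example sentences.
    Returns candidate classes, or a proposed new class definition
    when no existing class fits.
    """
    good, bad = set(good), set(bad)
    matches = sorted(
        name for name, d in VERB_CLASSES.items()
        if d["good"] <= good and d["bad"] <= bad
    )
    if matches:
        return {"hypotheses": matches, "new_class": None}
    return {"hypotheses": [], "new_class": {"good": good, "bad": bad}}


result = classify_verb(
    {"benefactive-ditransitive", "benefactive-transitive"}, set()
)
```

A verb whose pattern fits no class comes back with an
empty hypothesis list and its own pattern as the
proposed definition, which is the signal that a new
class may be needed.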

2 Sentence Classification 

The usual practice in investigating the alternation
patterning of a verb is to construct example
sentences in which simple, illustrative noun phrases
are used as arguments of a verb. The sentences in (1)
exemplify two familiar alternations of give.

(1) a. John gave Mary a book.
    b. John gave a book to Mary.

Such sentences exemplify an alternation that belongs
to the alternation pattern of their verb.[1] I will
call this the alternation type of the test sentence.

To determine the alternation type of a test
sentence, the sentence must be syntactically
analyzed so that its grammatical functions (e.g.
subject, object) are marked. Then, given semantic
feature information about the words filling those
grammatical functions (GFs), and information about
the possible argument structures for the verb in the
sentence and the semantic feature restrictions on
these arguments, it is possible to find the argument
structures appropriate to the input sentence.
Consider the sentences and descriptions shown below
for pour:

(2) a. [Mary_subj] poured [Tina_obj] [a glass of milk_io].
    b. [Mary_subj] poured [a glass of milk_obj] for
       [Tina_ppo].

pour1: subj -> agent [volitional]
       obj  -> recipient [volitional]
       io   -> patient [liquid]
pour2: subj -> agent [volitional]
       obj  -> patient [liquid]
       ppo  -> recipient [volitional]

Given the semantic type restrictions and the GFs,
pour1 describes (2a) and pour2 describes (2b). The mapping
from the GFs to the appropriate argument structure 
is similar to lexical rules in the LFG syntactic theory 
except that here I semantically type the arguments. 
To indicate the alternation types for these sentences, 
I call sentence (2a) a benefactive-ditransitive and 
sentence (2b) a benefactive-transitive. 
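The matching of GF fillers against semantically
typed argument structures can be sketched as below.
This is an illustration only: the FEATURES lexicon,
the dictionary layout and the matching_structures
function are assumptions, while the two argument
structures follow the descriptions of pour given
with example (2).

```python
# Hypothetical semantic features for the fillers in example (2).
FEATURES = {
    "Mary": {"volitional"},
    "Tina": {"volitional"},
    "a glass of milk": {"liquid"},
}

# Each structure maps a grammatical function to (role, required feature).
ARG_STRUCTURES = {
    "pour1": {
        "subj": ("agent", "volitional"),
        "obj": ("recipient", "volitional"),
        "io": ("patient", "liquid"),
    },
    "pour2": {
        "subj": ("agent", "volitional"),
        "obj": ("patient", "liquid"),
        "ppo": ("recipient", "volitional"),
    },
}


def matching_structures(gf_fillers):
    """Return the argument structures whose GFs and feature
    restrictions are all satisfied by the analyzed sentence."""
    matches = []
    for name, structure in ARG_STRUCTURES.items():
        if set(structure) != set(gf_fillers):
            continue  # the sentence must supply exactly these GFs
        if all(
            required in FEATURES.get(gf_fillers[gf], set())
            for gf, (_role, required) in structure.items()
        ):
            matches.append(name)
    return matches


sentence_2a = {"subj": "Mary", "obj": "Tina", "io": "a glass of milk"}
sentence_2b = {"subj": "Mary", "obj": "a glass of milk", "ppo": "Tina"}
```

Under these assumptions, matching_structures selects
pour1 for sentence_2a and pour2 for sentence_2b.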

Classifying a sentence by its alternation type
requires linguistic and world knowledge. World
knowledge is used in the definitions of nouns and
verbs in the lexicon and describes high-level
entities, such as events, and animate and inanimate
objects. Properties (such as LIQUID) are used to
define specialized entities. For example, the
property NON-CONSUMABLE (SMALL CAPITALS indicate
Classic concepts in my implementation) specializes
a LIQUID-ENTITY to define PAINT and distinguish it
from WATER, which has the property that it is
CONSUMABLE. Specialized event entities are used in
the definition of verbs in the lexicon and represent
the argument structures for the verbs.

[1] In the examples that I will consider, and in most
examples used by linguists to test alternation patterns,
there will only be one verb; this is the verb to be tested.

The linguistic knowledge needed to support sentence
classification includes the definitions of (1)
verb types such as intransitive, transitive and
ditransitive; (2) verb definitions; and (3) concepts
that define the links between the GFs and verb
argument structures as represented by events.

Verb types (SUBCATEGORIZATIONs) are defined
according to the GFs found in the sentence. For
example, (2a) classifies as DITRANSITIVE and (2b)
as a specialized TRANSITIVE with a PP. Once the
verb type is identified, verb definitions (VERBs) are
needed to provide the argument structures. A VERB
can have multiple senses which are instances of
events; for example, the verb "pour" can have the
senses pour or prepare, with the required arguments
shown below.[2] Note that pour1 and pour2 in (2) are
subcategorizations of prepare.

pour:    pourer   [volitional]
         pouree   [inanimate-container]
         poured   [inanimate-substance]

prepare: preparer [volitional]
         preparee [liquid]
         prepared [volitional]

For a sentence to classify as a particular
alternation, a legal linking must exist between an
event and the SUBCATEGORIZATION. Linking involves
restricting the fillers of the GFs in the
SUBCATEGORIZATION to be the same as the arguments in
an event. In Classic, the same-as restriction is
limited so that either both attributes must be
filled already with the same instance or the concept
must already be known as a legal-linking. Because of
this I created a test (written in LISP) to identify a
legal-linking. The test inputs are the sentence
predicate and GF fillers arranged in the order of the
event arguments against which they are to be tested.
A linking is legal when at least one of the events
associated with the verb can be linked in the
indicated way, and all the required arguments are
filled.
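The logic of the linking test can be sketched as
follows. The original test was written in LISP
against Classic; this Python version is only an
illustration under assumptions. The FEATURES
lexicon, the EVENTS table layout and the
legal_linking function are hypothetical, and each
event's arguments are treated as a positional list
of required features.

```python
# Hypothetical feature lexicon for the fillers used in the examples.
FEATURES = {
    "Mary": {"volitional"},
    "Tina": {"volitional"},
    "a glass of milk": {"liquid"},
}

# Event senses per verb; each sense lists the feature required of
# its arguments, in argument order.
EVENTS = {
    "pour": [
        ("pour", ["volitional", "inanimate-container", "inanimate-substance"]),
        ("prepare", ["volitional", "liquid", "volitional"]),
    ],
}


def legal_linking(predicate, fillers):
    """Return the first event sense that can be linked, else None.

    fillers: GF fillers arranged in the order of the event
    arguments; None marks an unfilled argument.
    A linking is legal when some sense of the predicate has every
    required argument filled by a filler with the required feature.
    """
    for sense, restrictions in EVENTS.get(predicate, []):
        if len(fillers) != len(restrictions):
            continue
        if all(
            filler is not None and need in FEATURES.get(filler, set())
            for filler, need in zip(fillers, restrictions)
        ):
            return sense
    return None


linked = legal_linking("pour", ["Mary", "a glass of milk", "Tina"])
```

Returning the matching sense, rather than a plain
boolean, mirrors the observation that a successful
linking also indicates which sense of the verb is
used in the sentence.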

Once a sentence passes the linking test, and
classifies as a particular alternation, a rule
associated with the alternation classifies it as a
specialization of the concept. This causes the event
arguments to be filled with the appropriate GF
fillers from the SUBCATEGORIZATION. A side-effect of
the alternation classification is that the event
classifies as a specialized event and indicates
which sense of the verb is used in the sentence.

[2] For generality in the implementation, I use arg1 ...
argn for all event definitions instead of agent ... patient
or preparer ... preparee.

3 Semantic Class Classification 

The semantic class of the verb can be identified once
the example sentences are classified by their
alternation type. Specialized VERB-CLASSes are
defined by their good and bad alternations. Note that
VERB defines one verb whereas VERB-CLASS describes a
set of verbs (e.g. the spray/load class). Which
alternations are associated with a VERB-CLASS is a
matter of linguistic evidence; the linguist discovers
these associations by testing examples for
grammaticality. To assist in this task, I provide two
tests, have-instances-of and have-no-instances-of.
The have-instances-of test for an alternation
searches a corpus of good sentences or bad sentences
and tests whether at least one instance of the
specified alternation, for example a
benefactive-ditransitive, is present.

A bad sentence with all the required verb arguments
will classify as an alternation despite the
ungrammatical syntactic realization, while a bad
sentence with missing required arguments will only
classify as a SUBCATEGORIZATION. The
have-no-instances-of test for a SUBCATEGORIZATION
searches a corpus of bad sentences and tests whether
at least one instance of the specified
SUBCATEGORIZATION, for example TRANSITIVE, is
present as the most specific classification.
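The two corpus tests can be sketched as simple
searches over classified sentences. The corpus
layout (each sentence carrying its classifications
ordered from most general to most specific), the two
sample sentences and the Python function names are
assumptions made for illustration, not the Classic
implementation.

```python
# Hypothetical corpora. "classes" is ordered from most general to
# most specific classification.
GOOD_SENTENCES = [
    {
        "text": "Mary poured Tina a glass of milk",
        "classes": ["DITRANSITIVE", "BENEFACTIVE-DITRANSITIVE"],
    },
]
BAD_SENTENCES = [
    # Assumed to lack a required argument, so it classifies no
    # further than its subcategorization.
    {"text": "Mary poured Tina", "classes": ["TRANSITIVE"]},
]


def have_instances_of(alternation, corpus):
    """True if at least one sentence in the corpus classifies as
    the given alternation."""
    return any(alternation in s["classes"] for s in corpus)


def have_no_instances_of(subcategorization, bad_corpus):
    """True if at least one bad sentence has the given
    subcategorization as its most specific classification."""
    return any(
        s["classes"] and s["classes"][-1] == subcategorization
        for s in bad_corpus
    )


found = have_instances_of("BENEFACTIVE-DITRANSITIVE", GOOD_SENTENCES)
stopped_at_transitive = have_no_instances_of("TRANSITIVE", BAD_SENTENCES)
```

The second test checks only the last classification,
so a bad sentence that went on to classify as an
alternation does not count as evidence for the bare
subcategorization.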

4 Discussion 

The ultimate test of this approach is in how well
it will scale up. The linguist may choose to add
knowledge as it is needed or may prefer to do this
work in batches. To support the batch approach,
it may be useful to extract detailed
subcategorization information from English learner's
dictionaries. Also it will be necessary to decide
what semantic features are needed to restrict the
fillers of the argument structures. Finally, there is
the problem of collecting complete sets of example
sentences for a verb. In general, a corpus of tagged
sentences is inadequate since it rarely includes
negative examples and is not guaranteed to exhibit
the full range of alternations. In applications where
a domain specific corpus is available (e.g. the Kant
MT project (Mitamura et al., 1993)), the full range
of relevant alternations is more likely to be
present. However, the lack of negative examples still
poses a problem and would require the project
linguist to create appropriate negative examples or
manually adjust the class definitions for further
differentiation.



While I have focused on a lexical research tool,
an area I will explore in future work is how
classification could be used in grammar writing. One
task for which a terminological language is
appropriate is flagging inconsistent rules. When
writing and maintaining a large grammar,
inconsistent rules are one type of grammar writing
bug that occurs. For example, the following three
rules are inconsistent since feature_i of NP and
feature_i of VP would not unify in rule 1 given the
values assigned in 2 and 3.

1) S -> NP VP
   <NP feature_i> = <VP feature_i>

2) NP -> det N
   <N feature_i> = +
   <NP> = <N>

3) VP -> V
   <V feature_i> = -
   <VP> = <V>
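The inconsistency among these three rules can be
detected mechanically. The sketch below assumes a
much simplified encoding of the rules: a table of
feature values fixed on N and V, a table recording
which category shares its features with which, and a
list of agreement constraints. All table and
function names are hypothetical, and the feature
name is a placeholder.

```python
# Values fixed by rules 2 and 3: <N feature> = +, <V feature> = -
ASSIGNED = {"N": {"feature_i": "+"}, "V": {"feature_i": "-"}}

# Feature sharing from rules 2 and 3: <NP> = <N>, <VP> = <V>
SHARES_WITH = {"NP": "N", "VP": "V"}

# Agreement constraint from rule 1: <NP feature> = <VP feature>
AGREEMENT = [("S -> NP VP", "NP", "VP", "feature_i")]


def value_of(category, feature):
    """Follow the sharing equations to the category that fixes
    the feature value; None if no value is assigned."""
    while category in SHARES_WITH:
        category = SHARES_WITH[category]
    return ASSIGNED.get(category, {}).get(feature)


def inconsistent_rules():
    """Return agreement constraints whose two sides carry
    different assigned values and therefore cannot unify."""
    conflicts = []
    for rule, left, right, feature in AGREEMENT:
        lv, rv = value_of(left, feature), value_of(right, feature)
        if lv is not None and rv is not None and lv != rv:
            conflicts.append((rule, feature, lv, rv))
    return conflicts


conflicts = inconsistent_rules()
```

Here rule 1 is flagged because NP inherits + from N
while VP inherits - from V. A real unification
grammar would need full feature-structure
unification rather than this table lookup.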

5 Conclusion 

I have shown how a terminological language, such
as Classic, can be used to manage lexical semantics
data during analysis with two minor extensions.
First, a test to identify legal-linkings is
necessary, since this cannot be directly expressed
in the language. Second, the set membership tests
have-instances-of and have-no-instances-of are
necessary, since this type of expressiveness is
not provided in Classic. While the solution of
several knowledge acquisition issues would result in
a friendlier tool for a linguistics researcher, the
tool still performs a useful function.